Personal Name Resolution of Web People Search
Authors
Abstract
Disambiguating personal names in a set of documents (such as the web pages returned in response to a person-name query) is a challenging task. In this paper, we explore the extent to which the “cluster hypothesis” holds for this task (i.e., that similar documents tend to represent the same person). We examine two clustering techniques that use either (1) term-based matching (single pass clustering) or (2) semantic-based matching (Probabilistic Latent Semantic Analysis). We compare these strategies and provide strong evidence that the hypothesis holds for the former. Indeed, on the new evaluation platform of the SemEval 2007 Web People Search task, we show that single pass clustering with a standard IR document representation fits the assumptions about the data and the task well, and yields state-of-the-art performance.
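To make the term-based strategy concrete, the following is a minimal sketch of single pass clustering over a bag-of-words representation. The abstract does not specify the term weighting, similarity measure, or threshold used in the paper, so the smoothed TF-IDF weights, cosine similarity, the `threshold` value, and the function names `tf_idf_vectors`, `cosine`, and `single_pass_cluster` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn whitespace-tokenised documents into sparse TF-IDF dicts.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1; the paper's
    actual weighting scheme is not given in the abstract.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(vectors, threshold=0.3):
    """Assign each document, in order, to the most similar existing cluster.

    A new cluster is opened whenever no cluster reaches `threshold`
    (a hypothetical value chosen for this example). Each centroid is kept
    as the sum of its members, which gives the same cosine as the mean.
    """
    clusters = []  # each entry: {"centroid": dict, "members": [doc indices]}
    for index, vector in enumerate(vectors):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if best is None or sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            for term, weight in vector.items():
                best["centroid"][term] = best["centroid"].get(term, 0.0) + weight
            best["members"].append(index)
        else:
            clusters.append({"centroid": dict(vector), "members": [index]})
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    # Toy pages returned for one ambiguous name (invented for illustration).
    pages = [
        "john smith professor computer science university research",
        "john smith guitarist band album tour music",
        "john smith computer science lecture university students",
        "john smith band music concert guitarist",
    ]
    print(single_pass_cluster(tf_idf_vectors(pages), threshold=0.3))
```

Because each document is compared only against the clusters that already exist, the result depends on document order and on the threshold. Under the cluster hypothesis, pages about the same person share enough terms to clear the threshold, while pages about namesakes do not.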
Similar resources
Searching for people on Web search engines
The Web is a communication and information technology that is often used for the distribution and retrieval of personal information. Many people and organizations mount Web sites containing large amounts of information on individuals, particularly about celebrities. However, limited studies have examined how people search for information on other people, using personal names, via Web search eng...
Disambiguating Personal Names on the Web Using Automatically Extracted Key Phrases
When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be for people with the same name. How can we disambiguate these different people with the same name? This paper presents an unsupervised algorithm which produces unique phrases to disambiguate different people with the same name (i.e. namesakes). Our algorithm ta...
CWePS: Chinese Web People Search
Name ambiguity is a big problem in personal information retrieval, especially given the explosive growth of Web data. In this demonstration, we present a prototype Chinese Web People Search system, called CWePS. Given a personal name as query, CWePS collects the top results from the existing search engines, and groups these returned pages into several clusters. Ideally, the Webpages in the same...
Improving the performance of personal name disambiguation using web directories
Frequent requests from users to search engines on the World Wide Web are to search for information about people using personal names. Current search engines only return sets of documents containing the name queried, but, as several people usually share a personal name, the resulting sets often contain documents relevant to several people. It is necessary to disambiguate people in these result s...
Extracting Key Phrases to Disambiguate Personal Names on the Web
When you search for information regarding a particular person on the web, a search engine returns many pages. Some of these pages may be for people with the same name. How can we disambiguate these different people with the same name? This paper presents an unsupervised algorithm which produces key phrases for the different people with the same name. These key phrases could be used to further n...
Automatic Annotation of Ambiguous Personal Names on the Web
Personal name disambiguation is an important task in social network extraction, evaluation and integration of ontologies, information retrieval, cross-document co-reference resolution and word sense disambiguation. We propose an unsupervised method to automatically annotate people with ambiguous names on the web using automatically extracted keywords. Given an ambiguous personal name, first, we...